
    Hack Weeks as a model for Data Science Education and Collaboration

    Across almost all scientific disciplines, the instruments that record our experimental data and the methods required for storage and data analysis are rapidly increasing in complexity. This creates a need for scientific communities to adapt on shorter time scales than traditional university curricula allow, and therefore calls for new modes of knowledge transfer. The universal applicability of data science tools to a broad range of problems has generated new opportunities to foster the exchange of ideas and computational workflows across disciplines. In recent years, hack weeks have emerged as an effective way to foster these exchanges by providing training in modern data analysis workflows. While hack week implementations vary, all events share a common core of three components: tutorials in state-of-the-art methodology, peer learning, and project work in a collaborative environment. In this paper, we present the concept of a hack week in the larger context of scientific meetings and point out its similarities to and differences from traditional conferences. We motivate the need for such an event and present its strengths and challenges in detail. We find that hack weeks are successful at cultivating collaboration and the exchange of knowledge. Participants self-report that these events help them both in their day-to-day research and in their careers. Based on our results, we conclude that hack weeks are an effective, easy-to-implement, fairly low-cost tool for improving data analysis literacy in academic disciplines, fostering collaboration, and cultivating best practices.
    Comment: 15 pages, 2 figures, submitted to PNAS, all relevant code available at https://github.com/uwescience/HackWeek-Writeu

    Classification of Stellar Spectra with LLE

    We investigate the use of dimensionality reduction techniques for the classification of stellar spectra selected from the SDSS. Using locally linear embedding (LLE), a technique that preserves the local (and possibly non-linear) structure within high-dimensional data sets, we show that the majority of stellar spectra can be represented as a one-dimensional sequence within a three-dimensional space. The position along this sequence is highly correlated with spectral temperature. Deviations from this "stellar locus" are indicative of spectra with strong emission lines (including misclassified galaxies) or broad absorption lines (e.g., carbon stars). Based on this analysis, we propose a hierarchical classification scheme using LLE that progressively identifies and classifies stellar spectra, requires no feature extraction, and can reproduce the classic MK classifications to an accuracy of one type.
    Comment: 15 pages, 13 figures; accepted for publication in The Astronomical Journal
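    The LLE step can be sketched with scikit-learn's LocallyLinearEmbedding. This is not the authors' pipeline. The SDSS spectra are replaced by hypothetical toy "spectra" (normalized blackbody curves over a range of temperatures, plus 1% noise) on an assumed 380 to 920 nm grid, and n_neighbors=10 is an assumed setting. Because temperature is the only parameter varied, the toy spectra lie near a one-dimensional curve. That is the kind of structure LLE is meant to preserve when it maps the spectra into three dimensions.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# Physical constants in SI units
H_PLANCK = 6.626e-34   # J s
C_LIGHT = 2.998e8      # m / s
K_BOLTZ = 1.381e-23    # J / K


def toy_spectra(temperatures_k, wavelengths_nm, noise=0.01, seed=0):
    """Hypothetical stand-in for stellar spectra: normalized blackbody curves with noise."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(wavelengths_nm)[None, :] * 1e-9    # shape (1, n_pixels), metres
    temp = np.asarray(temperatures_k)[:, None]           # shape (n_spectra, 1), kelvin
    flux = lam ** -5 / np.expm1(H_PLANCK * C_LIGHT / (lam * K_BOLTZ * temp))
    flux /= flux.sum(axis=1, keepdims=True)              # remove overall brightness
    return flux * (1.0 + noise * rng.standard_normal(flux.shape))


def embed(spectra, n_neighbors=10, n_components=3):
    """Map each spectrum to a point in a low-dimensional space with standard LLE."""
    lle = LocallyLinearEmbedding(
        n_neighbors=n_neighbors,
        n_components=n_components,
        eigen_solver="dense",   # deterministic and fast enough for a few hundred spectra
        random_state=0,
    )
    return lle.fit_transform(spectra)


temps = np.linspace(3000, 30000, 300)
X = toy_spectra(temps, np.linspace(380, 920, 200))
Y = embed(X)   # one 3-D point per spectrum
```

    Coloring the rows of `Y` by `temps` in a plot would show whether the embedding orders the toy spectra by temperature. With real spectra, points lying far from that sequence would be candidates for the outliers the abstract describes.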

    SNANA: A Public Software Package for Supernova Analysis

    We describe a general analysis package for supernova (SN) light curves, called SNANA, that contains a simulation, a light-curve fitter, and a cosmology fitter. The software is designed with the primary goal of using SNe Ia as distance indicators for the determination of cosmological parameters, but it can also be used to study efficiencies for analyses of SN rates, estimate contamination from non-Ia SNe, and optimize future surveys. Several SN models are available within the same software architecture. This allows technical features such as K-corrections to be applied consistently across multiple models, which makes detailed comparisons between models easier. New and improved light-curve models can be easily added. The software works with arbitrary surveys and telescopes and has already been used by several collaborations, leading to more robust and easier-to-use code. This software is not intended as a final product release; rather, it is designed to undergo continual improvement from the community as more is learned about SNe. Below we give an overview of the SNANA capabilities, as well as some of its limitations. Interested users can find software downloads and more detailed information in the manuals at http://www.sdss.org/supernova/SNANA.html .
    Comment: Accepted for publication in PASP
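    SNANA's simulation and fitters are far more complete than anything that fits here. As a hedged sketch of the relation a cosmology fitter compares SN Ia distances against, the code below computes the luminosity distance d_L = (1 + z)(c/H0) times the integral from 0 to z of dz'/E(z'), with E(z) = sqrt(Om (1 + z)^3 + OL), and the distance modulus mu = 5 log10(d_L / 10 pc). It assumes a flat LambdaCDM model with radiation neglected. The defaults H0 = 70 km/s/Mpc and Om = 0.3 are illustrative assumptions, and none of the function names are part of SNANA.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def luminosity_distance_mpc(z, h0=70.0, omega_m=0.3, n_steps=1000):
    """Luminosity distance in Mpc for an assumed flat LambdaCDM cosmology."""
    omega_l = 1.0 - omega_m  # flatness assumption

    def inv_e(zp):
        # 1 / E(z) with E(z) = H(z) / H0
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    # Composite Simpson's rule on [0, z]; n_steps must be even.
    step = z / n_steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * inv_e(i * step)
    comoving_mpc = (C_KM_S / h0) * total * step / 3.0
    return (1.0 + z) * comoving_mpc


def distance_modulus(z, **cosmo):
    """mu = 5 log10(d_L / 10 pc) = 5 log10(d_L / Mpc) + 25, for z > 0."""
    return 5.0 * math.log10(luminosity_distance_mpc(z, **cosmo)) + 25.0
```

    Setting omega_m=1.0 reduces the model to the Einstein-de Sitter case, which has the closed form d_L = (2c/H0)(1 + z)(1 - 1/sqrt(1 + z)). That makes it a convenient check on the numerical integration.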

    Tests of Modified Gravity with Dwarf Galaxies

    In modified gravity theories that seek to explain cosmic acceleration, dwarf galaxies in low-density environments can be subject to enhanced forces. The class of scalar-tensor theories, which includes f(R) gravity, predicts such a force enhancement (massive galaxies like the Milky Way can evade it through a screening mechanism that protects the interior of the galaxy from this "fifth" force). We study observable deviations from general relativity (GR) in the disks of late-type dwarf galaxies moving under gravity. The fifth force acts on the dark matter and the HI gas disk, but not on the stellar disk, owing to the self-screening of main-sequence stars. We find four distinct observable effects in such disk galaxies: (1) a displacement of the stellar disk from the HI disk; (2) warping of the stellar disk along the direction of the external force; (3) enhancement of the rotation curve measured from the HI gas compared to that of the stellar disk; and (4) asymmetry in the rotation curve of the stellar disk. We estimate that the spatial effects can be up to 1 kpc and the rotation velocity effects about 10 km/s in infalling dwarf galaxies. Such deviations are measurable: we expect that a careful analysis of a sample of nearby dwarf galaxies can improve astrophysical constraints on gravity theories by over three orders of magnitude, and even solar system constraints by one order of magnitude. Thus, effective tests of gravity along the lines suggested by Hui et al. (2009) and Jain (2011) can be carried out with low-redshift galaxies, though care must be exercised in understanding possible complications from astrophysical effects.
    Comment: 26 pages, 9 figures
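    Effect (3) can be illustrated with a deliberately idealized calculation that is not the paper's model. Assume a spherical enclosed mass that is fully unscreened and lies well inside the scalar field's Compton wavelength. Unscreened matter (the HI gas) then feels an effective gravitational constant G(1 + delta), with delta = 1/3 for f(R) gravity, while self-screened stars feel only G. The two circular velocities therefore differ by a factor of sqrt(1 + delta). The enclosed mass and radius passed in are hypothetical inputs, and the function names are illustrative.

```python
import math

G_KPC = 4.30091e-6  # gravitational constant in kpc (km/s)^2 / M_sun

# Idealized f(R)-like enhancement: unscreened matter feels G_eff = (1 + 1/3) G.
F_R_ENHANCEMENT = 1.0 / 3.0


def circular_velocity(m_enclosed_msun, r_kpc, delta_g=0.0):
    """v_c = sqrt((1 + delta_g) G M / r) in km/s for a spherical mass distribution."""
    return math.sqrt((1.0 + delta_g) * G_KPC * m_enclosed_msun / r_kpc)


def rotation_offset(m_enclosed_msun, r_kpc, delta_g=F_R_ENHANCEMENT):
    """Rotation speed of unscreened gas minus that of self-screened stars, in km/s."""
    v_gas = circular_velocity(m_enclosed_msun, r_kpc, delta_g)
    v_star = circular_velocity(m_enclosed_msun, r_kpc)
    return v_gas - v_star
```

    For example, a hypothetical enclosed mass of 1e9 solar masses at 2 kpc gives about 46.4 km/s for the stars and about 53.5 km/s for the gas in this idealized limit.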

    API design for machine learning software: experiences from the scikit-learn project

    Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.
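    The shared interface can be illustrated with a toy transformer that follows scikit-learn's documented estimator conventions. The class is hypothetical and is not the real sklearn.preprocessing.StandardScaler or code from the paper. The constructor only records hyperparameters. fit() learns state into attributes with a trailing underscore and returns self, so calls can be chained. get_params/set_params expose the hyperparameters. Plain Python lists are used so the sketch runs without scikit-learn installed.

```python
import math


class StandardScalerSketch:
    """Hypothetical transformer mimicking scikit-learn's estimator conventions."""

    def __init__(self, with_mean=True):
        # Constructor only stores hyperparameters; no data-dependent work here.
        self.with_mean = with_mean

    def get_params(self, deep=True):
        return {"with_mean": self.with_mean}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y=None):
        # Learned attributes carry a trailing underscore; fit returns self.
        n = len(X)
        columns = list(zip(*X))
        self.mean_ = [sum(col) / n for col in columns]
        self.scale_ = []
        for col, mean in zip(columns, self.mean_):
            std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
            self.scale_.append(std if std > 0 else 1.0)  # leave constant columns unscaled
        return self

    def transform(self, X):
        shift = self.mean_ if self.with_mean else [0.0] * len(self.mean_)
        return [[(v - s) / sc for v, s, sc in zip(row, shift, self.scale_)] for row in X]

    def fit_transform(self, X, y=None):
        # Chaining works because fit() returns the estimator itself.
        return self.fit(X, y).transform(X)
```

    Because every object exposes the same small set of methods, generic tools (pipelines, grid search, cross-validation) can treat any such object interchangeably. This is the composition benefit the abstract refers to.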

    Scikit-learn: Machine Learning in Python

    Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
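    A short usage sketch, assuming a recent scikit-learn release. load_iris, make_pipeline, StandardScaler, LogisticRegression, and cross_val_score are real library APIs. The choice of model, the 5-fold split, and max_iter=1000 are illustrative assumptions. The sketch shows how API consistency lets a scaler and a classifier be chained into a single estimator and evaluated with cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def iris_cv_accuracy(cv=5):
    """Mean cross-validated accuracy of a scaled logistic-regression pipeline on iris."""
    X, y = load_iris(return_X_y=True)
    # Each step shares the fit/transform/predict interface, so the steps compose.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores = cross_val_score(model, X, y, cv=cv)
    return scores.mean()
```

    Swapping LogisticRegression for another classifier leaves the rest of the code unchanged, which is the practical payoff of a uniform API.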